<!DOCTYPE html>
<html lang="si">
<head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <meta name="description" content="මෙය PyTorch හි ස්විච් ට්රාන්ස්ෆෝමර් හි කුඩා අනුවාදයකි."/>

    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
    <meta name="twitter:title" content="ට්රාන්ස්ෆෝමර් ස්විචය"/>
    <meta name="twitter:description" content="මෙය PyTorch හි ස්විච් ට්රාන්ස්ෆෝමර් හි කුඩා අනුවාදයකි."/>
    <meta name="twitter:site" content="@labmlai"/>
    <meta name="twitter:creator" content="@labmlai"/>

    <meta property="og:url" content="https://nn.labml.ai/transformers/switch/index.html"/>
    <meta property="og:title" content="ට්රාන්ස්ෆෝමර් ස්විචය"/>
    <meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
    <meta property="og:site_name" content="ට්රාන්ස්ෆෝමර් ස්විචය"/>
    <meta property="og:type" content="object"/>
    <meta property="og:title" content="ට්රාන්ස්ෆෝමර් ස්විචය"/>
    <meta property="og:description" content="මෙය PyTorch හි ස්විච් ට්රාන්ස්ෆෝමර් හි කුඩා අනුවාදයකි."/>

    <title>ට්රාන්ස්ෆෝමර් ස්විචය</title>
    <link rel="shortcut icon" href="/icon.png"/>
    <link rel="stylesheet" href="../../pylit.css?v=1">
    <link rel="canonical" href="https://nn.labml.ai/transformers/switch/index.html"/>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">

    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
    <script>
        window.dataLayer = window.dataLayer || [];

        function gtag() {
            dataLayer.push(arguments);
        }

        gtag('js', new Date());

        gtag('config', 'G-4V3HC8HBLH');
    </script>
</head>
<body>
<div id='container'>
    <div id="background"></div>
    <div class='section'>
        <div class='docs'>
            <p>
                <a class="parent" href="/">home</a>
                <a class="parent" href="../index.html">transformers</a>
                <a class="parent" href="index.html">switch</a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations" target="_blank">
                    <img alt="Github"
                         src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social"
                         style="max-width:100%;"/></a>
                <a href="https://twitter.com/labmlai" rel="nofollow" target="_blank">
                    <img alt="Twitter"
                         src="https://img.shields.io/twitter/follow/labmlai?style=social"
                         style="max-width:100%;"/></a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/switch/__init__.py" target="_blank">
                    View code on Github</a>
            </p>
        </div>
    </div>
    <div class='section' id='section-0'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-0'>#</a>
            </div>
            <h1>ට්රාන්ස්ෆෝමර්ස්විචය</h1>
<p>මෙයකඩදාසි <a href="https://papers.labml.ai/paper/2101.03961">ස්විච් ට්රාන්ස්ෆෝමර්වල කුඩා <a href="https://pytorch.org">පයිටෝච්</a> ක්රියාත්මක කිරීමකි: සරල හා කාර්යක්ෂම ස්පාර්ටිටි සහිත ට්රිලියන පරාමිති ආකෘති වලට පරිමාණය</a> කිරීම. අපගේ ක්රියාත්මක කිරීම සඳහා ඇත්තේ පරාමිතීන් මිලියන කිහිපයක් පමණක් වන අතර සමාන්තරව බෙදා හරින ලද පුහුණුව ආදර්ශයට නොගනී. එය තනි GPU පුහුණුව කරන්නේ, නමුත් අපි කඩදාසි විස්තර කර ඇති පරිදි මාරුවීමේ සංකල්පය ක්රියාත්මක කරමු. </p>
<p>ස්විච්ට්රාන්ස්ෆෝමරය ටෝකනය මත පදනම්ව පරාමිතීන් අතර මාරුවීමෙන් එක් එක් ටෝකනය සඳහා විවිධ පරාමිතීන් භාවිතා කරයි. එබැවින්, එක් එක් ටෝකනය සඳහා තෝරා ගනු ලබන්නේ පරාමිතීන්ගෙන් කොටසක් පමණි. එබැවින් ඔබට වැඩි පරාමිතීන් තිබිය හැකි නමුත් අඩු පරිගණකමය පිරිවැයක් ඇත. </p>
<p>මාරුවීමසිදුවන්නේ එක් එක් ට්රාන්ස්ෆෝමර් බ්ලොක් එකේ ස්ථාන-නැණවත් Feedforward ජාලයේ (FFN) ය. ස්ථාන-නැණවත් පෝෂක ජාලය අනුක්රමිකව පූර්ණ සම්බන්ධිත ස්ථර දෙකකින් සමන්විත වේ. ස්විච් ට්රාන්ස්ෆෝමරයේ අපට FFNs (බහු විශේෂ experts යින්) කිහිපයක් ඇති අතර, රවුටරයක් මත පදනම්ව භාවිතා කළ යුත්තේ කුමන එකද යන්න අපි තෝරා ගත්තෙමු. ප්රතිදානය යනු එෆ්එෆ්එන් තෝරා ගැනීම සඳහා වන සම්භාවිතාවන් සමූහයක් වන අතර, අපි ඉහළම සම්භාවිතාව ඇති එකක් තෝරාගෙන එය ඇගයීමට ලක් කරමු. එබැවින් අත්යවශ්යයෙන්ම පරිගණකමය පිරිවැය තනි එෆ්එෆ්එන් එකක් තිබීම හා සමාන වේ. අපගේ ක්රියාත්මක කිරීමේදී මෙය ඔබට බොහෝ හෝ විශාල එෆ්එෆ්එන්එස් ඇති විට සමාන්තරගත නොවේ. බෙදා හරින ලද සැකසුමක ඔබට එක් එක් FFN (සෑම ඉතා විශාල) වෙනත් උපාංගයක ඇත. </p>
<p>විශේෂexperts යන් (එෆ්එෆ්එන්එස්) අතර බර සමතුලිත කිරීම සඳහා කඩදාසි තවත් පාඩු යෙදුමක් හඳුන්වා දෙන අතර රවුටින් සමතුලිත නොවන විට ටෝකන අතහැර දැමීම සාකච්ඡා කරයි. </p>
<p>කුඩාෂේක්ස්පියර් දත්ත කට්ටුවේ ස්විච් ට්රාන්ස්ෆෝමරයක් පුහුණු කිරීම සඳහා පුහුණු <a href="experiment.html">කේතය</a> සහ සටහන් පොතක් මෙන්න. </p>
<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/switch/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://app.labml.ai/run/353770ce177c11ecaa5fb74452424f46"> <img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a></p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">40</span><span></span><span class="kn">import</span> <span class="nn">torch</span>
<span class="lineno">41</span><span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span>
<span class="lineno">42</span>
<span class="lineno">43</span><span class="kn">from</span> <span class="nn">labml_helpers.module</span> <span class="kn">import</span> <span class="n">Module</span>
<span class="lineno">44</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.feed_forward</span> <span class="kn">import</span> <span class="n">FeedForward</span>
<span class="lineno">45</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.mha</span> <span class="kn">import</span> <span class="n">MultiHeadAttention</span>
<span class="lineno">46</span><span class="kn">from</span> <span class="nn">labml_nn.utils</span> <span class="kn">import</span> <span class="n">clone_module_list</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-1'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-1'>#</a>
            </div>
            <h2>බහුFFNs අතර මාර්ගගත කිරීම</h2>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">49</span><span class="k">class</span> <span class="nc">SwitchFeedForward</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-2'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-2'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">capacity_factor</span></code>
 පරමාදර්ශී සමබර බරට සාපේක්ෂව සාධකයක් ලෙස එක් එක් විශේෂ expert යාගේ ධාරිතාවයි </li>
<li><code  class="highlight"><span></span><span class="n">drop_tokens</span></code>
 ධාරිතාවයට වඩා වැඩි ටෝකන විශේෂ expert යෙකු වෙත යොමු කළහොත් ටෝකන අතහැර දැමිය යුතුද යන්න නියම කරයි </li>
<li><code  class="highlight"><span></span><span class="n">is_scale_prob</span></code>
 රවුටින් සම්භාවිතාව අනුව එෆ්එෆ්එන් වෙත ආදානය ගුණ කළ යුතුද යන්න නියම කරයි </li>
<li><code  class="highlight"><span></span><span class="n">n_experts</span></code>
 විශේෂඥයන් සංඛ්යාව </li>
<li><code  class="highlight"><span></span><span class="n">expert</span></code>
 විශේෂ expert ස්තරය, <a href="../feed_forward.html">එෆ්එෆ්එන් මොඩියුලයකි</a> </li>
<li><code  class="highlight"><span></span><span class="n">d_model</span></code>
 යනු ටෝකන කාවැද්දීම තුළ ඇති විශේෂාංග ගණන </li>
<li><code  class="highlight"><span></span><span class="n">d_ff</span></code>
 යනු FFN හි සැඟවුණු ස්ථරයේ ඇති ලක්ෂණ ගණන </li>
<li><code  class="highlight"><span></span><span class="n">dropout</span></code>
 FFN හි අතහැර දැමීමේ සම්භාවිතාව</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">54</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">55</span>                 <span class="n">capacity_factor</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span>
<span class="lineno">56</span>                 <span class="n">drop_tokens</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
<span class="lineno">57</span>                 <span class="n">is_scale_prob</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
<span class="lineno">58</span>                 <span class="n">n_experts</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="lineno">59</span>                 <span class="n">expert</span><span class="p">:</span> <span class="n">FeedForward</span><span class="p">,</span>
<span class="lineno">60</span>                 <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-3'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-3'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">71</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="lineno">72</span>
<span class="lineno">73</span>        <span class="bp">self</span><span class="o">.</span><span class="n">capacity_factor</span> <span class="o">=</span> <span class="n">capacity_factor</span>
<span class="lineno">74</span>        <span class="bp">self</span><span class="o">.</span><span class="n">is_scale_prob</span> <span class="o">=</span> <span class="n">is_scale_prob</span>
<span class="lineno">75</span>        <span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span> <span class="o">=</span> <span class="n">n_experts</span>
<span class="lineno">76</span>        <span class="bp">self</span><span class="o">.</span><span class="n">drop_tokens</span> <span class="o">=</span> <span class="n">drop_tokens</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-4'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-4'>#</a>
            </div>
            <p>FFNsහි පිටපත් සාදන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">79</span>        <span class="bp">self</span><span class="o">.</span><span class="n">experts</span> <span class="o">=</span> <span class="n">clone_module_list</span><span class="p">(</span><span class="n">expert</span><span class="p">,</span> <span class="n">n_experts</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-5'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-5'>#</a>
            </div>
            <p>රවුටින්ස්තරය සහ සොෆ්ට්මැක්ස් </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">81</span>        <span class="bp">self</span><span class="o">.</span><span class="n">switch</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">d_model</span><span class="p">,</span> <span class="n">n_experts</span><span class="p">)</span>
<span class="lineno">82</span>        <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Softmax</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-6'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-6'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">x</span></code>
 හැඩය සහිත මාරුවීමේ මොඩියුලයට ආදානය වේ <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">]</span></code>
</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">84</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-7'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-7'>#</a>
            </div>
            <p>පසුවහැඩයන් වෙනස් කිරීම සඳහා හැඩය අල්ලා ගන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">90</span>        <span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shape</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-8'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-8'>#</a>
            </div>
            <p>අනුක්රමයසහ කණ්ඩායම් මානයන් සමතලා කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">92</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="n">d_model</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-9'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-9'>#</a>
            </div>
            <p>එක්එක් ටෝකන සඳහා මාර්ගගත කිරීමේ සම්භාවිතාව ලබා ගන්න. <span ><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:2.872049em;vertical-align:-1.3070490000000001em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.565em;"><span style="top:-2.128769em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:-0.0000050000000000050004em;">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.981231em;"><span style="top:-2.40029em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span><span style="top:-3.2029em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqf" style=""><span class="mord mathnormal mtight" style="margin-right:0.10903em">N</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.43581800000000004em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8300699999999999em;"><span style="top:-3.0050700000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">h</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">x</span><span class="mclose mtight"><span class="mclose mtight">)</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight" style="margin-right:0.05724em;">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2818857142857143em;"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8879999999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">h</span><span class="mopen mtight">(</span><span class="mord mathnormal mtight">x</span><span class="mclose mtight"><span class="mclose mtight">)</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3280857142857143em;"><span style="top:-2.357em;margin-left:0em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.3070490000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></span> විශේෂ experts යන්ගේ සංඛ්යාව <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.68333em;vertical-align:0em;"></span><span class="mord coloredeq eqf" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span></span></span></span></span></span> <code  class="highlight"><span></span><span class="n">n_experts</span></code>
 කොතැනද සහ <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">h</span><span class="mopen">(</span><span class="mord">⋅</span><span class="mclose">)</span></span></span></span></span> ටෝකන් කාවැද්දීම් රේඛීය පරිවර්තනය වේ. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">98</span>        <span class="n">route_prob</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">switch</span><span class="p">(</span><span class="n">x</span><span class="p">))</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-10'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-10'>#</a>
            </div>
            <p>උපරිමමාර්ගගත කිරීමේ සම්භාවිතාව සහ මාර්ග ලබා ගන්න. අපි ඉහළම සම්භාවිතාව සහිත විශේෂ expert යා වෙත ගමන් කරමු </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">102</span>        <span class="n">route_prob_max</span><span class="p">,</span> <span class="n">routes</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="n">route_prob</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-11'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-11'>#</a>
            </div>
            <p>එක්එක් විශේෂ. යා වෙත යන ටෝකන වල දර්ශක ලබා ගන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">105</span>        <span class="n">indexes_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">eq</span><span class="p">(</span><span class="n">routes</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span><span class="o">.</span><span class="n">nonzero</span><span class="p">(</span><span class="n">as_tuple</span><span class="o">=</span><span class="kc">True</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span><span class="p">)]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-12'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-12'>#</a>
            </div>
            <p>ප්රතිදානයන්ගබඩා කිරීම සඳහා හිස් ටෙන්සරයක් ආරම්භ කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">108</span>        <span class="n">final_output</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-13'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-13'>#</a>
            </div>
            <p>එක්එක් විශේෂඥයාගේ ධාරිතාව. <span ><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.8623000000000001em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathrm">expert</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathrm" style="margin-right:0.01389em;">capacity</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:2.25188em;vertical-align:-0.8804400000000001em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3714399999999998em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">number</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathrm" style="margin-right:0.07778em;">of</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathrm">experts</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord mathrm">tokens</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathrm">per</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathrm">batch</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.8804400000000001em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord mathrm" style="margin-right:0.01389em;">capacity</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathrm">factor</span></span></span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">114</span>        <span class="n">capacity</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">capacity_factor</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-14'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-14'>#</a>
            </div>
            <p>එක්එක් විශේෂ. යා වෙත යොමු කරන ලද ටෝකන ගණන. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">116</span>        <span class="n">counts</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">new_tensor</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span><span class="p">)])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-15'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-15'>#</a>
            </div>
            <p>පහතවැටුණු ටෝකන වල හිස් ලැයිස්තුවක් ආරම්භ කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">119</span>        <span class="n">dropped</span> <span class="o">=</span> <span class="p">[]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-16'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-16'>#</a>
            </div>
            <p>නම්පමණක් ටෝකන <code  class="highlight"><span></span><span class="n">drop_tokens</span></code>
 අතහරින්න <code  class="highlight"><span></span><span class="kc">True</span></code>
. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">121</span>        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">drop_tokens</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-17'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-17'>#</a>
            </div>
            <p>එක්එක් විශේෂ. යන් තුළ ටෝකන අතහරින්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">123</span>            <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-18'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-18'>#</a>
            </div>
            <p>විශේෂexpert යා ධාරිතාව වැඩි නොවේ නම් නොසලකා හරින්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">125</span>                <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">&lt;=</span> <span class="n">capacity</span><span class="p">:</span>
<span class="lineno">126</span>                    <span class="k">continue</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-19'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-19'>#</a>
            </div>
            <p>අතහැරදැමීමට පෙර දර්ශක කලවම් කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">128</span>                <span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">torch</span><span class="o">.</span><span class="n">randperm</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">]))]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-20'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-20'>#</a>
            </div>
            <p>පහතවැටුණු ටෝකන ලෙස ධාරිතාවයට වඩා ටෝකන එකතු කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">130</span>                <span class="n">dropped</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">capacity</span><span class="p">:])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-21'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-21'>#</a>
            </div>
            <p>විශේෂexpert යාගේ ධාරිතාව දක්වා ටෝකන පමණක් තබා ගන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">132</span>                <span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">][:</span><span class="n">capacity</span><span class="p">]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-22'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-22'>#</a>
            </div>
            <p>විශේෂexpert එෆ්එෆ්එන්එස් හි ප්රතිදානයන් ලබා ගන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">135</span>        <span class="n">expert_output</span> <span class="o">=</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">experts</span><span class="p">[</span><span class="n">i</span><span class="p">](</span><span class="n">x</span><span class="p">[</span><span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="p">:])</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span><span class="p">)]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-23'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-23'>#</a>
            </div>
            <p>අවසානනිමැවුමට පවරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">138</span>        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_experts</span><span class="p">):</span>
<span class="lineno">139</span>            <span class="n">final_output</span><span class="p">[</span><span class="n">indexes_list</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="p">:]</span> <span class="o">=</span> <span class="n">expert_output</span><span class="p">[</span><span class="n">i</span><span class="p">]</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-24'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-24'>#</a>
            </div>
            <p>පහතවැටුණු ටෝකන හරහා යන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">142</span>        <span class="k">if</span> <span class="n">dropped</span><span class="p">:</span>
<span class="lineno">143</span>            <span class="n">dropped</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">(</span><span class="n">dropped</span><span class="p">)</span>
<span class="lineno">144</span>            <span class="n">final_output</span><span class="p">[</span><span class="n">dropped</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">dropped</span><span class="p">,</span> <span class="p">:]</span>
<span class="lineno">145</span>
<span class="lineno">146</span>        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">is_scale_prob</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-25'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-25'>#</a>
            </div>
            <p>සම්භාවිතාවවිසින් විශේෂඥ ප්රතිදානයන් විසින් ගුණ <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">y</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord"><span class="mord mathnormal">p</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.05764em;">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31166399999999994em;"><span style="top:-2.5500000000000003em;margin-left:-0.05764em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span></span> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">148</span>            <span class="n">final_output</span> <span class="o">=</span> <span class="n">final_output</span> <span class="o">*</span> <span class="n">route_prob_max</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="lineno">149</span>        <span class="k">else</span><span class="p">:</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-26'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-26'>#</a>
            </div>
            <p>අගයන්පරිමාණය නොකරන්න, <span ><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.228608em;vertical-align:-0.481108em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7475em;"><span style="top:-2.6550000000000002em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord accent mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-2.7em;"><span class="pstrut" style="height:2.7em;"></span><span class="mord mathnormal mtight">p</span></span><span style="top:-2.7em;"><span class="pstrut" style="height:2.7em;"></span><span class="accent-body" style="left:-0.16666em;"><span class="mord mtight">^</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.19444em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.446108em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">p</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.481108em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:0.64444em;vertical-align:0em;"></span><span class="mord">1</span></span></span></span></span> එවිට අනුක්රමික ගලා යන පරිදි ගුණ කරන්න (මෙය අප අත්හදා බැලූ දෙයකි). </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">152</span>            <span class="n">final_output</span> <span class="o">=</span> <span class="n">final_output</span> <span class="o">*</span> <span class="p">(</span><span class="n">route_prob_max</span> <span class="o">/</span> <span class="n">route_prob_max</span><span class="o">.</span><span class="n">detach</span><span class="p">())</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-27'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-27'>#</a>
            </div>
            <p>අවසානනිමැවුමේ හැඩය නැවත වෙනස් කරන්න <code  class="highlight"><span></span><span class="p">[</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">]</span></code>
 </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">155</span>        <span class="n">final_output</span> <span class="o">=</span> <span class="n">final_output</span><span class="o">.</span><span class="n">view</span><span class="p">(</span><span class="n">seq_len</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">d_model</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-28'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-28'>#</a>
            </div>
            <p>ආපසු</p>
<ul><li>අවසානප්රතිදානය </li>
<li>එක්එක් විශේෂ. යා වෙත යොමු කරන ලද ටෝකන ගණන </li>
<li>එක්එක් විශේෂ. යා සඳහා සම්භාවිතාවන් එකතුව </li>
<li>ටෝකනගණන පහත වැටුණි. </li>
<li>තෝරාගත්විශේෂඥයින්ගේ සම්භාවිතාවන්</li></ul>
<p>බරතුලනය කිරීමේ අලාභය සහ ලොග් වීම සඳහා මේවා භාවිතා වේ </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">166</span>        <span class="k">return</span> <span class="n">final_output</span><span class="p">,</span> <span class="n">counts</span><span class="p">,</span> <span class="n">route_prob</span><span class="o">.</span><span class="n">sum</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">dropped</span><span class="p">),</span> <span class="n">route_prob_max</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-29'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-29'>#</a>
            </div>
            <h1>ස්විච්ට්රාන්ස්ෆෝමර් බ්ලොක්</h1>
<p>ස්විච්පෝෂක මොඩියුලයේ අමතර ප්රතිදානයන් හැසිරවීමත් සමඟ මෙය <a href="../models.html#TransformerLayer">සාමාන්ය ට්රාන්ස්ෆෝමර් කොටුවට</a> සමාන වේ. </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">169</span><span class="k">class</span> <span class="nc">SwitchTransformerLayer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-30'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-30'>#</a>
            </div>
            <ul><li><code  class="highlight"><span></span><span class="n">d_model</span></code>
 ටෝකනය කාවැද්දීමේ ප්රමාණයයි </li>
<li><code  class="highlight"><span></span><span class="n">attn</span></code>
 අවධානය යොමු මොඩියුලය වේ </li>
<li><code  class="highlight"><span></span><span class="n">feed_forward</span></code>
 යනු ආහාර ඉදිරි මොඩියුලයයි (මෙම නඩුවේ මාරුවීමේ මොඩියුලය වන) </li>
<li><code  class="highlight"><span></span><span class="n">dropout_prob</span></code>
 ස්වයං අවධානයෙන් පසු ඉවත් වීමේ සම්භාවිතාව සහ FFN</li></ul>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">177</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">178</span>                 <span class="n">d_model</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
<span class="lineno">179</span>                 <span class="n">attn</span><span class="p">:</span> <span class="n">MultiHeadAttention</span><span class="p">,</span>
<span class="lineno">180</span>                 <span class="n">feed_forward</span><span class="p">:</span> <span class="n">SwitchFeedForward</span><span class="p">,</span>
<span class="lineno">181</span>                 <span class="n">dropout_prob</span><span class="p">:</span> <span class="nb">float</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-31'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-31'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">188</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>
<span class="lineno">189</span>        <span class="bp">self</span><span class="o">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">d_model</span>
<span class="lineno">190</span>        <span class="bp">self</span><span class="o">.</span><span class="n">attn</span> <span class="o">=</span> <span class="n">attn</span>
<span class="lineno">191</span>        <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span> <span class="o">=</span> <span class="n">feed_forward</span>
<span class="lineno">192</span>        <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout_prob</span><span class="p">)</span>
<span class="lineno">193</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm_self_attn</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span>
<span class="lineno">194</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-32'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-32'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">196</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
<span class="lineno">197</span>                <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">198</span>                <span class="n">mask</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-33'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-33'>#</a>
            </div>
            <p>ස්වයංඅවධානය යොමු කිරීමට පෙර දෛශික සාමාන්යකරණය කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">200</span>        <span class="n">z</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_self_attn</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-34'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-34'>#</a>
            </div>
            <p>ස්වයංඅවධානය හරහා ධාවනය කරන්න, i.e. යතුරු සහ වටිනාකම් ස්වයං සිට </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">202</span>        <span class="n">self_attn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">attn</span><span class="p">(</span><span class="n">query</span><span class="o">=</span><span class="n">z</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">z</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="n">z</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="n">mask</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-35'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-35'>#</a>
            </div>
            <p>ස්වයංඅවධානය ප්රතිඵල එකතු </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">204</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">self_attn</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-36'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-36'>#</a>
            </div>
            <p>පෝෂණයසඳහා සාමාන්යකරණය කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">207</span>        <span class="n">z</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-37'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-37'>#</a>
            </div>
            <p>මාරුවීමේපෝෂක-ඉදිරි ජාලය හරහා ගමන් කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">209</span>        <span class="n">ff</span><span class="p">,</span> <span class="n">counts</span><span class="p">,</span> <span class="n">route_prob</span><span class="p">,</span> <span class="n">n_dropped</span><span class="p">,</span> <span class="n">route_prob_max</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span><span class="p">(</span><span class="n">z</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-38'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-38'>#</a>
            </div>
            <p>ප්රතිපෝෂණඉදිරි ප්රති results ල නැවත එක් කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">211</span>        <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">ff</span><span class="p">)</span>
<span class="lineno">212</span>
<span class="lineno">213</span>        <span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">counts</span><span class="p">,</span> <span class="n">route_prob</span><span class="p">,</span> <span class="n">n_dropped</span><span class="p">,</span> <span class="n">route_prob_max</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-39'>
        <div class='docs doc-strings'>
            <div class='section-link'>
                <a href='#section-39'>#</a>
            </div>
            <h2>ට්රාන්ස්ෆෝමර්ස්විචය</h2>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">216</span><span class="k">class</span> <span class="nc">SwitchTransformer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-40'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-40'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">221</span>    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">layer</span><span class="p">:</span> <span class="n">SwitchTransformerLayer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
<span class="lineno">222</span>        <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-41'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-41'>#</a>
            </div>
            <p>ට්රාන්ස්ෆෝමර්ස්ථරයේ පිටපත් සාදන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">224</span>        <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">clone_module_list</span><span class="p">(</span><span class="n">layer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-42'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-42'>#</a>
            </div>
            <p>අවසානසාමාන්යකරණ ස්තරය </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">226</span>        <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">layer</span><span class="o">.</span><span class="n">size</span><span class="p">])</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-43'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-43'>#</a>
            </div>
            
        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">228</span>    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">mask</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-44'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-44'>#</a>
            </div>
            <p>එක්එක් ට්රාන්ස්ෆෝමර් ස්ථරය හරහා ධාවනය කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">230</span>        <span class="n">counts</span><span class="p">,</span> <span class="n">route_prob</span><span class="p">,</span> <span class="n">n_dropped</span><span class="p">,</span> <span class="n">route_prob_max</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
<span class="lineno">231</span>        <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">layers</span><span class="p">:</span>
<span class="lineno">232</span>            <span class="n">x</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">n_d</span><span class="p">,</span> <span class="n">p_max</span> <span class="o">=</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">mask</span><span class="o">=</span><span class="n">mask</span><span class="p">)</span>
<span class="lineno">233</span>            <span class="n">counts</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="lineno">234</span>            <span class="n">route_prob</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="lineno">235</span>            <span class="n">n_dropped</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">n_d</span><span class="p">)</span>
<span class="lineno">236</span>            <span class="n">route_prob_max</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">p_max</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-45'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-45'>#</a>
            </div>
            <p>අවසානවශයෙන්, දෛශික සාමාන්යකරණය කරන්න </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">238</span>        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='section' id='section-46'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-46'>#</a>
            </div>
            <p> </p>

        </div>
        <div class='code'>
            <div class="highlight"><pre><span class="lineno">240</span>        <span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">counts</span><span class="p">),</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">route_prob</span><span class="p">),</span> <span class="n">n_dropped</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">route_prob_max</span><span class="p">)</span></pre></div>
        </div>
    </div>
    <div class='footer'>
        <a href="https://papers.labml.ai">Trending Research Papers</a>
        <a href="https://labml.ai">labml.ai</a>
    </div>
</div>
<script src=../../interactive.js?v=1"></script>
<script>
    function handleImages() {
        var images = document.querySelectorAll('p>img')

        for (var i = 0; i < images.length; ++i) {
            handleImage(images[i])
        }
    }

    function handleImage(img) {
        img.parentElement.style.textAlign = 'center'

        var modal = document.createElement('div')
        modal.id = 'modal'

        var modalContent = document.createElement('div')
        modal.appendChild(modalContent)

        var modalImage = document.createElement('img')
        modalContent.appendChild(modalImage)

        var span = document.createElement('span')
        span.classList.add('close')
        span.textContent = 'x'
        modal.appendChild(span)

        img.onclick = function () {
            console.log('clicked')
            document.body.appendChild(modal)
            modalImage.src = img.src
        }

        span.onclick = function () {
            document.body.removeChild(modal)
        }
    }

    handleImages()
</script>
</body>
</html>